My portfolio: an introduction

When the course started, we were asked to choose a corpus. It was important to find something that allowed for meaningful comparisons and contrasts, so that we could answer a specific research question. This gave me an interesting idea: since 2014, I have been keeping track of all the songs I listen to, using a website called Last FM. Every time a track is played on a media player such as Spotify or iTunes, a “scrobble” is recorded. This way, I have scrobbled a total of 121,587 tracks (and counting!). What better corpus to choose than one that contains a large part of all the music you have ever listened to? Although at the start of this course I had never worked with an API, and had only just learned intermediate R skills in my third year of Psychology, it sounded like an interesting challenge. So, I started googling.

Very soon after, I found out that this might become a daunting task. Collecting all my scrobbles from the Last FM API wasn’t the hard part; combining over 100,000 songs with their Spotify features, however, was something I was not yet capable of. Luckily, I found a guide written by Andrew Walker, a researcher from the University of Florida, with detailed instructions on how to do exactly this. Fetching the features would be the slowest part of the code, he said, likely taking up to 10-15 minutes. For a dataset as large as mine, that was obviously a gross underestimation. Once I got the code working, I split the fetching process into two parts and my dataset into five parts, and let it all run sequentially. Six hours of waiting later, it was finally there: all my scrobbles with their corresponding Spotify features! From this point on, I knew that analyzing my corpus could lead to some very interesting results.
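A minimal Python sketch of the split-and-run-sequentially approach (not the guide's R code). `fetch_batch` is a hypothetical stand-in for whatever client call returns audio features for a list of track IDs; the batch size of 100 is assumed to match the maximum number of IDs Spotify's audio-features endpoint accepts per request. Deduplicating the IDs first is an added assumption: repeat plays of the same song only need one lookup.

```python
from typing import Callable, Dict, List, Sequence

Features = Dict[str, dict]


def split_into_parts(items: Sequence[str], n_parts: int) -> List[List[str]]:
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(list(items[start:end]))
        start = end
    return parts


def fetch_all(
    scrobbled_ids: Sequence[str],
    fetch_batch: Callable[[List[str]], Features],  # hypothetical API wrapper
    n_parts: int = 5,
    batch_size: int = 100,
) -> Features:
    """Fetch features for every unique track, one part at a time, in batches."""
    # Many scrobbles are repeat plays: keep each track ID once, preserving order.
    unique_ids = list(dict.fromkeys(scrobbled_ids))
    features: Features = {}
    for part in split_into_parts(unique_ids, n_parts):
        for start in range(0, len(part), batch_size):
            features.update(fetch_batch(part[start:start + batch_size]))
    return features
```

Passing in a fake `fetch_batch` lets the splitting logic be checked without network access. In practice, each part could also be saved to disk before moving on, so that a crash does not cost hours of work.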

In this portfolio, I will try to answer one main research question: How has time influenced my music listening? To answer this question, I will look at three different components:

  • Genre
  • The relationship between genre and features
  • A comparison between two songs, from 2015 and 2019

First, I will describe how I grouped the artists in my corpus into clusters, and how I went from cluster to genre. Then, I will show the relationship between the years of my life, genre, and the Spotify features. From this, I will try to find out whether the Spotify features that changed over the years are related to the genres that changed over the years. Finally, I will use two songs that exemplify two different parts of my life: one from 2015, when I was mostly still in high school, and one from 2019, when I was well on my way in my Bachelor’s. What can the musical data Spotify provides tell me about myself?

2015

Gauges for 2015: total song plays, unique album plays, unique song plays

In 2015, I was still in high school, and I was listening to a lot of Mac DeMarco. I put on his music and listened to all his albums, pretty much on repeat. I have never listened to one artist that much again, which is why the music I listened to in 2014 and 2015 is still at the top of my most played. Since then, I have started listening to a much wider range of music, which can also be seen in the album plays in the gauges.

2016

Gauges for 2016: total song plays, unique album plays, unique song plays

At the start of 2016, I was in the middle of my gap year. I was spending a lot of time playing guitar and listening to music, but I was still in the early phases of discovery. I took all this into my first year of studying, where I was still listening to a lot of the music I found in the year before.

2017

Gauges for 2017: total song plays, unique album plays, unique song plays

For me, 2017 got off to a strange start. I had quit studying philosophy and was living in Amsterdam, but I had no idea what direction my life was going in. This was the point at which I felt I needed to take some bigger steps. You can see this very clearly in my music listening: the number of different albums I listened to nearly tripled! It will be interesting to see whether we can also find trends in the Spotify features from this year onward.

2018

Gauges for 2018: total song plays, unique album plays, unique song plays

2018 marked the start of something new. I started studying Psychology, and I was beginning to listen to music on a whole new level. Since I had to spend hours studying in the library, I started listening to different music as well: Boards of Canada was one of my go-to artists for studying, and has slowly become one of my favorite artists.

2019

Gauges for 2019: total song plays, unique album plays, unique song plays

It was 2019, and things started gaining traction. I was discovering more of my would-be favorite artists: I had listened to Yo La Tengo and Boards of Canada before, but now I came across all sorts of nineties bands I just couldn’t ignore. Suddenly I was also finding all sorts of electronic music I liked, which can be seen in the genre chart.

All Years

Gauges for 2015 to 2019: total song plays, unique album plays, unique song plays

Over the years, I have listened to a very large amount of music. I know my music listening habits very well, but can we also see them reflected in the data?

Clustering

Top 100 artists cluster


So I had downloaded this big amount of data, but where to go from there? For a while I tried to find a way to group the artists by genre. Unfortunately, Spotify doesn’t provide genre information for tracks, and Professor Burgoyne told us it would be hard. Of course, this sounded like a challenge, so I tried to think of ways to do it anyway. With the Last FM API at hand, I found out that it was possible to fetch “tags”. Tags are labels that Last FM users can give to artists, and they usually reflect an artist’s genre quite well. So, I went ahead and wrote a script to fetch the tags for my top 100 artists. This was the easy part; where did I go from here?
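The tag lookup can be sketched in Python as below (not the script used for this portfolio). It calls Last.fm’s `artist.getTopTags` web-service method with JSON output; the response shape (`toptags` containing a `tag` list whose entries have `name` and `count`) is assumed from the public API documentation and worth checking against a live response. The `min_count` and `max_tags` filters are hypothetical choices for dropping tags that only a few users applied.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://ws.audioscrobbler.com/2.0/"


def top_tags_url(artist: str, api_key: str) -> str:
    """Build the request URL for an artist's top tags."""
    query = urlencode({
        "method": "artist.gettoptags",
        "artist": artist,
        "api_key": api_key,
        "format": "json",
    })
    return f"{API_ROOT}?{query}"


def parse_top_tags(payload: dict, min_count: int = 10, max_tags: int = 10) -> List[str]:
    """Return lower-cased tag names whose weight is at least min_count (hypothetical cut-offs)."""
    tags = payload.get("toptags", {}).get("tag", [])
    kept = [t["name"].strip().lower() for t in tags if int(t.get("count", 0)) >= min_count]
    return kept[:max_tags]


def fetch_top_tags(artist: str, api_key: str) -> List[str]:
    """Network call: requires a valid Last.fm API key."""
    with urlopen(top_tags_url(artist, api_key)) as response:
        return parse_top_tags(json.load(response))
```

Keeping the parsing separate from the network call makes it easy to test on a saved response and to rerun the filtering with different thresholds without fetching again.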

I had a matrix with all the artists, with a 1 for each tag an artist had and a 0 for each tag it didn’t have. I got stuck here for a while; my Psychology background unfortunately didn’t help much at this point. It wasn’t until the last week that a classmate pointed me towards k-means clustering. I didn’t believe it at first, but it was as simple as passing my matrix to the k-means function. The plot on the left is the result.
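A self-contained Python sketch of the same idea: build the 0/1 artist-by-tag matrix, then run k-means on its rows. The portfolio used R’s `kmeans()`, which defaults to the Hartigan-Wong algorithm but also offers `algorithm = "Lloyd"`; this sketch implements Lloyd’s algorithm, so assignments on real data may differ. The artist keys in the test are placeholders, not my actual artists.

```python
import random
from typing import Dict, List, Sequence, Tuple


def tag_matrix(artist_tags: Dict[str, Sequence[str]]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Rows = artists, columns = tags; 1 if the artist carries the tag, else 0."""
    artists = sorted(artist_tags)
    tags = sorted({tag for tag_list in artist_tags.values() for tag in tag_list})
    rows = []
    for artist in artists:
        own = set(artist_tags[artist])
        rows.append([1 if tag in own else 0 for tag in tags])
    return artists, tags, rows


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: List[List[int]], k: int, n_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's algorithm: alternate nearest-centre assignment and centre updates."""
    rng = random.Random(seed)
    # Start from k distinct rows chosen at random as the initial centres.
    centres = [[float(v) for v in row] for row in rng.sample(rows, k)]
    labels: List[int] = []
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(row, centres[c])) for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Because k-means depends on its starting centres, running it with several seeds and checking that the clusters stay the same is a cheap way to judge how robust a grouping is.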

The most obvious cluster is the seventies “classic rock” cluster at the top (cluster 3), containing artists like Pink Floyd and Steely Dan. This is the music it all started with for me, and I still feel I owe it a lot. The main genre I’ve listened to since I started using Last FM in 2014 is indie/psychedelic rock. These are clusters 4 and 5 in the bottom left: cluster 4 is “Lo-fi, psychedelic/indie”, and cluster 5 is “Dream pop, psychedelic indie”. The blue cluster 1 on the right is the “electronica/downtempo” cluster. Unfortunately, the last cluster, cluster 2, doesn’t make a lot of sense: it consists mostly of outliers. For my analyses, I moved some of these artists into other clusters.

Which artists are represented most?


The cluster plot looked very promising! Judging by the plot, the artists actually seemed to be far enough apart for the different clusters to make sense. Judging by my own ear, which is not algorithmic but trained by hours of listening, I was very pleased with these results.

I decided on 5 clusters, since using more would put artists into arbitrary categories. Using fewer would merge the outlier cluster into the electronica/downtempo cluster, which would not make sense: these artists are not closely related, and the electronica/downtempo cluster is actually quite accurate. Two clusters for indie music might seem excessive, but this split was quite robust, and knowing these artists, I think separating lo-fi rock from dream pop makes sense. Also, a single indie cluster would make up so much of my corpus that it could hardly be compared to the other clusters.

To get an idea of how the clusters are represented in my corpus, on the left you can see each cluster, named by genre, with the artists that had the most plays in that cluster.

From cluster to genre


In this plot, you can see how much I have listened to each genre over the years. To get a valid measure, I calculated proportions: the number of plays for a genre divided by the total number of plays in that year. As you can see, there are some definite changes. From 2014 to mid-2015, my final years of high school, the lo-fi, psychedelic/indie genre decreases and classic rock increases. When I started my gap year, I discovered Steely Dan and started listening a lot to Electric Light Orchestra as well, artists that are well represented in the classic rock cluster, as the previous plot shows.

After my gap year, in my first year of university, classic rock decreased and a small peak in lo-fi, psychedelic/indie appears. That was still my main genre at this point: I was discovering a lot of new music and leaving older music (a.k.a. classic rock) behind. My taste would soon start diverging, though: after meeting a very special person who was very fond of downtempo music, I started listening to it a lot as well. She has inspired me to find and listen to a lot of new music. This is clearly visible in 2018 and 2019, when there is suddenly a steep rise in unique album listens.
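The proportion described above is, for a genre g in year y, the number of plays of g in y divided by all plays in y. A minimal Python sketch, assuming each scrobble has already been reduced to a `(year, genre)` pair via its artist’s cluster (the genre labels in the test are illustrative):

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def genre_proportions(plays: Iterable[Tuple[int, str]]) -> Dict[int, Dict[str, float]]:
    """Share of each genre among all plays in the same year."""
    per_year: Counter = Counter()
    per_year_genre: Counter = Counter()
    for year, genre in plays:
        per_year[year] += 1
        per_year_genre[(year, genre)] += 1
    shares: Dict[int, Dict[str, float]] = {year: {} for year in sorted(per_year)}
    for (year, genre), n in per_year_genre.items():
        shares[year][genre] = n / per_year[year]
    return shares
```

Dividing by each year’s total keeps a year in which I simply listened to more music from looking like a rise in every genre at once.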

In 2020, my taste seems to have crystallized into two clusters: electronica/downtempo and lo-fi, psychedelic/indie. I listen to a lot of music in the other genres as well, but these are more for special occasions. Time will tell what changes in my music taste are still waiting for me.

The next step: finding out what might explain these changes in genre.

Spotify features

Different genres…


So, my cluster analysis seems to provide some insight into how my music preferences have changed over time. Towards 2020, electronica/downtempo even seems to have overtaken the lo-fi, psychedelic/indie genre. Since genre describes an artist, and the features provided by Spotify describe that artist’s songs, there could be some kind of relationship between them. To make a comparison, we first need to ask: how have the Spotify features of my music changed over time?

Different features…


My first plan when I had my data ready was to look at the Spotify features over time, to see whether interesting patterns would emerge. The plot on the left shows the date, from 2014 to 2020, against a selection of the features. From the data on my front page you could already tell that my music listening has changed. This is also reflected in the features.
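One way to summarise such a plot numerically is to average each feature per year. A minimal Python sketch, assuming each scrobble is a dict holding a `year` and the Spotify feature values of its track (the key names are illustrative). Every play counts once, so often-played songs weigh more, which matches the question of what I actually listened to:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Sequence


def yearly_feature_means(
    scrobbles: Iterable[dict], features: Sequence[str]
) -> Dict[int, Dict[str, float]]:
    """Mean of each feature over all plays in a year (each play counts once)."""
    values: Dict[int, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for play in scrobbles:
        for name in features:
            values[play["year"]][name].append(play[name])
    # Return the years in chronological order.
    return {
        year: {name: mean(vals) for name, vals in by_feature.items()}
        for year, by_feature in sorted(values.items())
    }
```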

The biggest change over time seems to be in instrumentalness. Acousticness seems to be increasing as well, while energy is decreasing. How can this be explained?

and their relationship


In this plot, you can see the relationship between my two main genres and the energy, instrumentalness, and acousticness of the music I listen to. As energy goes down, the amount of lo-fi/indie music I listen to goes down, while the amount of electronica/downtempo goes up. Likewise, as instrumentalness and acousticness go up, electronica/downtempo goes up and lo-fi/indie goes down.
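The visual relationship can be put into a single number with a Pearson correlation between, say, the yearly mean energy and the yearly share of a genre. A minimal Python sketch; the values in the test are invented for illustration, not taken from my data, and with only six yearly points such a coefficient is fragile:

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)
```

A strongly negative value between yearly energy and the yearly share of electronica/downtempo would match the plot, but it says nothing about which one drives the other.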

From this, I can draw a general conclusion: my taste in music has changed over the years, and so has the way I use music. As a student, my life has become busier and a lot more fun, but also a lot more exhausting. The moments I actually sit down to listen to music are the moments I like to use to wind down.

So, there is a relationship between the change of these genres and the change of these features. What kind of relationship could this be?

Genres and features


“Correlation isn’t causation” is a common warning against unjustified inferences. As you can see, the instrumentalness of the electronica/downtempo genre is indeed much higher than that of lo-fi, psychedelic/indie. The increase in instrumentalness could therefore be ascribed to the increase in electronica/downtempo over the years.

However, the energy and acousticness seem to be the same across these two genres. The energy of downtempo is a bit lower, but the difference is small. So, the decrease in energy and the increase in acousticness are not explained by the increase in electronica/ downtempo.

What can be seen though, is that the ambient/ classical cluster has much higher acousticness and much lower energy. What can be inferred from this?

What about energy?


So, we saw a trend towards lower energy over the years. Apparently this was not just because I was listening more to electronica/downtempo. I wanted to visualize what might be the cause, and this plot seems to explain a lot. As you can see, the energy seems to go down for all the genres I listen to. What’s more, the energy of the ambient/classical cluster seems to have gone down by a huge margin. There is a possible explanation for this: the cluster consists mostly of outliers, so the group isn’t very accurate. What I do know, however, is that in the past year I have been listening to a lot more ambient and classical music to wind down. The genre plot over the years seemed to contradict this, since this cluster showed no increase. However, since the cluster is mostly a collection of outliers, random artists from earlier years are grouped with ambient and classical artists from later years.

With this genre being very low energy and me listening to it more, and a general trend towards lower energy music for most of the other genres, the lower energy in 2019 compared to 2015 makes sense now.

The explanation for acousticness is the same: the ambient/classical cluster has not grown much over the years because it is grouped with artists that are not closely related to this genre. If the ambient/classical artists were separated from the rest, we would see an upward trend in the later years. There probably would also not be such a steep drop in energy from 2015 to 2019 for this cluster.

2015 vs. 2019

Tempo


So, I concluded that the change in the features isn’t just caused by the change in genre over the years. It seems my music taste in general has changed. To demonstrate this, I will analyse some important aspects of the music, comparing 2015 and 2019: the last period of high school, and the last period of my Bachelor’s in Psychology.

For this plot, I used my top 15 songs from 2015 and 2019. Here you can see the mean tempo plotted against the SD of tempo, with colour indicating tempo, size indicating song duration, and opacity indicating loudness. There seem to be quite large differences between 2015 and 2019. First of all, the range of tempo is much larger in 2019: it spans from ~70 to ~160 BPM, while in 2015 the tempo is clustered around 100. This indicates that my music taste had become more varied by 2019, as we already saw on the first page. This can also be seen in song duration and loudness: in 2015 they are similar across songs, while in 2019 they vary more.
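Spotify’s audio analysis splits each track into sections, each with its own tempo estimate, which is one way to obtain a per-song mean and standard deviation of tempo. The sketch below assumes the sections are available as a list of dicts with a `tempo` key, as in the audio-analysis JSON, and takes a plain, unweighted mean; weighting by each section’s `duration` would be a reasonable alternative.

```python
from statistics import mean, stdev
from typing import Iterable, Tuple


def tempo_summary(sections: Iterable[dict]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the section tempos of one track."""
    tempos = [float(section["tempo"]) for section in sections]
    if not tempos:
        raise ValueError("track has no sections")
    # A single section has no spread; report 0.0 instead of raising.
    spread = stdev(tempos) if len(tempos) > 1 else 0.0
    return mean(tempos), spread
```

Applying this to each of the top 15 songs per year gives one (mean, SD) point per song, which is what the plot shows.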

What we also see is that the tempo of the songs in 2019 is generally lower than in 2015. We already saw earlier that the tempo of my music has decreased over the years, and that my electronica/downtempo music has a lower tempo, so these are likely related.

Interestingly, the standard deviation of tempo seems to increase with tempo. Does this mean that higher-tempo songs also vary more in tempo? I have no clue. It could just be an error in Spotify’s analysis.

Two centre points


So there are differences between 2015 and 2019: lower tempo, and also lower energy (as we saw earlier). Based on this, I chose the two artists I listened to most in those years, and a song that is a good example of each artist. For 2015, I chose Freaking Out the Neighborhood by Mac DeMarco: a happy, quite upbeat song. For 2019, I chose Hey Saturday Sun by Boards of Canada, a song that is calm and, as its genre implies, quite downtempo.

In this plot, I compare these two artists. Energy is on the y-axis. As you can see, the energy of Boards of Canada is generally lower than that of Mac DeMarco, which fits the observation that the energy in my music has decreased over the years. The size and color of the dots represent loudness. For both artists, but for Boards of Canada especially, the lower-energy songs also seem to be quieter: the redder, larger dots are mostly near the lower end of the energy axis.

In the next slides, I will look at the differences between these songs in multiple visualizations. From this, I will try to find out what the differences are between my low-energy music from 2019 and higher-energy music from 2015.

Chroma


It is obvious from the chromagram that these two songs are very different. Whereas the chroma features of the Mac DeMarco song are all over the place, the Boards of Canada song is very stable. Boards of Canada’s music is quite monotonous: not many different notes are played throughout the song. The song has a steady beat with lots of repetition, which can be seen in the notes repeating in similar patterns. Also, it is electronic music, so there aren’t many random noises produced by instruments. Mac DeMarco mostly uses a standard band setup of guitar, bass, drums and vocals, and in the chromagram you can see many more different tones, without an easily recognizable pattern.

Although we can’t infer much about the energy from the structure of the chromagram, you can tell that the Boards of Canada song is probably more boring: there is not a lot happening in the music. This is why it is perfect for me to listen to while studying, which is also the reason that I listen to this type of music more than I did before. Perhaps it is mere exposure that made me fall in love with the music in general.

Self-similarity


Again, we see obvious differences. The Mac DeMarco song clearly follows a classic pop-song pattern. For this song I used the pitch features, since they provided the clearest results. The song starts out with the theme of the chorus, followed by a verse, chorus, verse, and an outro that also contains the theme of the chorus. The yellow part at the end is an outro that only appears once in the song.

The self-similarity matrix of the Boards of Canada song also contains clear patterns. For this song I used the timbre features for the best result. Throughout the song, the same guitar riff is played. A couple of times, the riff is replaced by a strumming guitar with synth sounds on the side. The self-similarity matrix tells us that many parts of this song are very similar: the guitar, the synth, and the beat are all constant. The first section stays yellow throughout: it contains a short intro that is not repeated anywhere in the song.

Comparing the two songs, the similarity panes of the Mac DeMarco song are much smaller. This indicates that although some short parts are the same, there is also a lot of variation (changes in pattern). The Boards of Canada song has many large blue panes, which indicate that larger parts of the song repeat and are similar. Combined with the low BPM, this shows some of the ways in which this track might be lower-energy.
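A self-similarity matrix compares every moment of a track with every other moment. A minimal Python sketch, assuming the track is given as a list of equal-length feature vectors (for example 12 chroma or timbre values per segment). It uses cosine distance, one common choice: 0 means the two vectors point the same way, about 1 means unrelated, and values up to 2 are possible for features that can be negative, such as timbre. Repeated passages then show up as blocks of low distance, like the large panes described above.

```python
from math import sqrt
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0.0 for vectors pointing the same way."""
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 and norm_b == 0.0:
        return 0.0  # two silent frames count as identical
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # silence versus sound: treat as unrelated
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


def self_similarity(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Matrix of pairwise distances: entry [i][j] compares frame i with frame j."""
    return [[cosine_distance(a, b) for b in frames] for a in frames]
```

The matrix is symmetric with zeros on the diagonal, which is why such plots are mirrored around their main diagonal.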

Tempo: Boards of Canada


Finally, a tempogram of the Boards of Canada song. It suggests a tempo of around 120 BPM, but listening to the song, this is incorrect: the tempo is doubled, and the beat is actually around 60 BPM. This shows why beat analysis can be problematic. I have noticed that for many downtempo songs, BPM recognition sites sometimes report double or even triple the actual BPM. The kick is slow, but there are also drum sounds in between the kicks. These sounds are probably picked up as beats, resulting in an erroneous BPM estimate. As can be seen, though, the beat is very steady throughout the song. Since the beat is so slow, it seems to me to fit the low energy of this type of music very well.
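The doubling comes down to simple arithmetic: BPM = 60 / (seconds between beats). A minimal Python sketch with invented onset times: a kick once per second gives 60 BPM, but if the drum hits between the kicks are also counted as beats, the interval halves and the estimate doubles to 120 BPM.

```python
from statistics import median
from typing import Sequence


def bpm_from_onsets(onset_times: Sequence[float]) -> float:
    """Tempo in BPM from the median gap (in seconds) between detected beats."""
    if len(onset_times) < 2:
        raise ValueError("need at least two onsets")
    gaps = [later - earlier for earlier, later in zip(onset_times, onset_times[1:])]
    return 60.0 / median(gaps)


kicks_only = [float(t) for t in range(9)]      # one kick per second
kicks_and_hits = [t * 0.5 for t in range(17)]  # an extra hit halfway between kicks
print(bpm_from_onsets(kicks_only))      # 60.0
print(bpm_from_onsets(kicks_and_hits))  # 120.0
```

Both inputs describe the same steady groove; only the choice of which hits count as beats changes the answer by a factor of two.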

Final thoughts

When I started the course, I didn’t quite know where it would take me. When we tried working with the Spotify API in the first class, and I noticed how much interesting information you could get with just a few lines of code, I was very tempted to take it a step further. I have spent a lot of time analysing my data, and in the process, I have made huge steps in coding. The DataCamp assignments gave me inspiration, but everything else I had to look up myself. Although this has taken me a lot (really, a lot) of time, I have gained so much coding experience in the process that I am very happy with where I am right now.

Now the most important part: the music. My musical data was very interesting to me, since it was so personal. Delving deeper into the information that could be fetched from the Spotify API, I feel like I have learned a lot more about my music. But more importantly, I have learned a lot more about musicology. Before this course, I had never looked at chroma features or self-similarity matrices, and I had never even thought about how timbre could be analysed. Applying this to music I know so well has taught me a lot.

Datawise, the main takeaways for me are:

  • I now listen to a wider variety of music than I did before
  • I am exploring more genres than I did before
  • From the Spotify features, it seems the music I listen to has become much calmer, and more instrumental and acoustic over the years
  • and, as could be seen from the chroma features and self-similarity matrices:
    • My low energy music has lower loudness, is more monotonous, and contains less variation than higher energy music

…and this seems to fit the changes in my life quite well. I am studying more, and after having a busy day I like to listen to music to wind down. This can be visualized with the Spotify features quite well, so in a way, Spotify knows more about a person than you’d think.